SearchableSnapshotsIntegTests should wait for shard index folder to be cleaned up #80341

tlrx · 2021-11-04T13:03:25Z

The tests SearchableSnapshotsIntegTests.testCreateAndRestoreSearchableSnapshot and FrozenSearchableSnapshotsIntegTests.testCreateAndRestorePartialSearchableSnapshot both failed once when asserting the shard folders using assertShardFolders(index, true).

The failures occur when the original index is first closed (not deleted) and then mounted again under the same name, so it is restored as a searchable snapshot index on top of the existing shard files. The SearchableSnapshotDirectory implementation takes care to clean up the shard files on disk using SearchableSnapshotDirectory.cleanExistingRegularShardFiles(). The tests verify that the shard index folder is indeed deleted from disk on all nodes, but they sometimes fail because the folder is still present.

I wasn't able to reproduce the failure, but I think that closing the original index and creating the .snapshot-blob-cache index trigger some shard relocations that are then cancelled by the subsequent mount/restore, leaving some files on disk that should be cleaned up, but maybe not immediately.

This pull request changes the tests to use assertBusy() when verifying the shard folders, and also adds more logging information in case waiting on assertShardFolders(index, true) is not enough.
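To illustrate the idea of retrying the folder check instead of asserting once, here is a minimal, self-contained sketch. It does not use the Elasticsearch test framework: `retryUntilPasses` and `Check` are hypothetical stand-ins for what assertBusy() does conceptually (re-run an assertion until it passes or a timeout elapses), and the temporary directory only simulates a shard index folder that is deleted asynchronously.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.TimeUnit;

public class AssertBusySketch {

    /** A check that may throw AssertionError until the condition holds. */
    @FunctionalInterface
    interface Check {
        void run() throws Exception;
    }

    /**
     * Hypothetical stand-in for assertBusy(): re-runs the check with a growing
     * sleep until it passes or the timeout elapses, then rethrows the last failure.
     */
    static void retryUntilPasses(Check check, long maxWait, TimeUnit unit) throws Exception {
        final long deadline = System.nanoTime() + unit.toNanos(maxWait);
        long sleepMillis = 1;
        while (true) {
            try {
                check.run();
                return; // the assertion held
            } catch (AssertionError e) {
                if (System.nanoTime() - deadline >= 0) {
                    throw e; // out of time: surface the last failure
                }
            }
            Thread.sleep(sleepMillis);
            sleepMillis = Math.min(sleepMillis * 2, 100);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: a temp directory plays the role of the shard index folder.
        Path shardIndexFolder = Files.createTempDirectory("shard-index");

        // Simulate asynchronous cleanup: the folder disappears a little later.
        Thread cleaner = new Thread(() -> {
            try {
                Thread.sleep(50);
                Files.deleteIfExists(shardIndexFolder);
            } catch (IOException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        });
        cleaner.start();

        // A single immediate assertion could fail here; retrying tolerates the delay.
        retryUntilPasses(() -> {
            if (Files.exists(shardIndexFolder)) {
                throw new AssertionError("shard index folder still present: " + shardIndexFolder);
            }
        }, 10, TimeUnit.SECONDS);

        cleaner.join();
        System.out.println("folder cleaned up");
    }
}
```

The trade-off is the usual one for eventually-consistent checks: a real bug (a folder that is never deleted) still fails the test, only later, once the timeout expires.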

Closes #77831

elasticmachine · 2021-11-04T13:03:29Z

Pinging @elastic/es-distributed (Team:Distributed)

tlrx · 2021-11-04T13:03:53Z

...t/java/org/elasticsearch/xpack/searchablesnapshots/BaseSearchableSnapshotsIntegTestCase.java

@@ -75,7 +77,7 @@ protected boolean addMockInternalEngine() {

    @Override
    protected Collection<Class<? extends Plugin>> nodePlugins() {
-        return List.of(LocalStateSearchableSnapshots.class);
+        return CollectionUtils.appendToCopy(super.nodePlugins(), LocalStateSearchableSnapshots.class);


This is independent of the fix, but it looked wrong so I fixed it.
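A small sketch of why the one-line change matters: returning a fresh `List.of(...)` from the override silently drops whatever the superclass registered, while appending to a copy keeps it. The classes and the `appendToCopy` helper below are hypothetical and self-contained (strings stand in for plugin classes); the helper only mirrors the idea of the `CollectionUtils.appendToCopy` call in the diff, not its actual implementation.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class NodePluginsSketch {

    // Hypothetical helper: copy the collection, then append one element.
    static <E> List<E> appendToCopy(Collection<E> collection, E element) {
        List<E> result = new ArrayList<>(collection.size() + 1);
        result.addAll(collection);
        result.add(element);
        return List.copyOf(result); // immutable copy, like List.of(...)
    }

    // Stand-in for a base test case that already registers a plugin.
    static class BaseTestCase {
        protected Collection<String> nodePlugins() {
            return List.of("BasePlugin");
        }
    }

    // Before: the override replaces the superclass plugins.
    static class ReplacingTestCase extends BaseTestCase {
        @Override
        protected Collection<String> nodePlugins() {
            return List.of("LocalStateSearchableSnapshots");
        }
    }

    // After: the override extends the superclass plugins.
    static class AppendingTestCase extends BaseTestCase {
        @Override
        protected Collection<String> nodePlugins() {
            return appendToCopy(super.nodePlugins(), "LocalStateSearchableSnapshots");
        }
    }

    public static void main(String[] args) {
        System.out.println(new ReplacingTestCase().nodePlugins());
        System.out.println(new AppendingTestCase().nodePlugins());
    }
}
```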

arteam

LGTM. Let's see if assertBusy makes any difference

tlrx · 2021-11-05T08:49:06Z

Thanks Artem!

elasticsearchmachine · 2021-11-05T08:49:50Z

💔 Backport failed

Status	Branch	Result
✅	8.0
❌	7.16	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 80341

Wait for shard index folder to be cleaned up

4a4c450

tlrx added the >test, :Distributed Coordination/Snapshot/Restore, v8.0.0, v7.16.0, and v8.1.0 labels Nov 4, 2021

elasticmachine added the Team:Distributed (Obsolete) label Nov 4, 2021

tlrx commented Nov 4, 2021

tlrx requested review from arteam and original-brownbear November 4, 2021 14:12

arteam approved these changes Nov 4, 2021

tlrx added the auto-backport-and-merge label Nov 5, 2021

tlrx merged commit c633615 into elastic:master Nov 5, 2021

tlrx deleted the assert-busy-hard-folders branch November 5, 2021 08:48

tlrx mentioned this pull request Nov 5, 2021

[8.0] Wait for shard index folder to be cleaned up (#80341) #80393

Merged

tlrx mentioned this pull request Nov 5, 2021

Wait for shard index folder to be cleaned up (#80341) #80402

Merged

mark-vieira added v8.0.0-rc1 and removed v8.0.0 labels Jan 12, 2022

arteam added a commit to arteam/elasticsearch that referenced this pull request Mar 14, 2022

Increase timeout for waiting for the cleanup of the shared index folder

8b36133

Relates elastic#80341 Closes 84158

arteam mentioned this pull request Mar 14, 2022

Increase timeout for waiting for the cleanup of the shared index folder #84942

Merged

arteam added a commit that referenced this pull request Mar 18, 2022

Increase timeout for waiting for the cleanup of the shared index fold…

b0ab939

…er (#84942) Relates #80341 Closes 84158

arteam mentioned this pull request Feb 22, 2024

[CI] SearchableSnapshotsIntegTests testCreateAndRestoreSearchableSnapshot failing #105202

Closed
